ABSTRACT

A web services application programming interface (API) was developed to provide programmatic access to the regulatory interactions accumulated in the RegPrecise database (http://regprecise.lbl.gov), a core resource on transcriptional regulation for the microbial domain of the Department of Energy (DOE) Systems Biology Knowledgebase. RegPrecise captures and visualizes regulogs, which are sets of genes controlled by orthologous regulators in several closely related bacterial genomes, reconstructed by comparative genomics. The current release of RegPrecise 2.0 includes >1400 regulogs controlled either by protein transcription factors or by conserved ribonucleic acid regulatory motifs in >250 genomes from 24 taxonomic groups of bacteria. The reference regulons accumulated in RegPrecise can serve as a basis for automatic annotation of regulatory interactions in newly sequenced genomes. The developed API provides efficient access to the RegPrecise data through a comprehensive set of 14 web service resources. The RegPrecise web services API is freely accessible at http://regprecise.lbl.gov/RegPrecise/services.jsp with no login requirements.

INTRODUCTION 

A genome-scale transcriptional regulatory network (TRN) for any given microbial organism is a critical step toward the next big milestone in systems biology: building an integrated metabolic and regulatory model that can accurately predict cellular growth phenotypes.



Several approaches and associated web resources have been developed for genome-wide reconstruction of metabolic pathways in microbial genomes. BioCyc maintains an encyclopedia of experimentally defined metabolic pathways in model organisms and uses it to reconstruct metabolism in other sequenced genomes (1). Model SEED allows users to submit any complete genome to an annotation pipeline that generates a draft metabolic model (2).

At the same time, large-scale TRNs are so far available only for a limited number of model organisms, such as Escherichia coli (3), Bacillus subtilis (4), Corynebacterium glutamicum (5) and Mycobacterium tuberculosis (6). A fundamental difference between the reconstruction of TRNs and that of metabolic pathways is that regulatory interactions are much less conserved across bacterial genomes than metabolic pathways are (7). To address the challenging problem of propagating regulatory interactions between distant genomes, we recently developed a strategy for the genomic reconstruction of large-scale TRNs in diverse microbial genomes (8). The bacterial species tree is subdivided into small taxonomic families, and a subset of 5–12 representative genomes is selected for each family. Comparative genomics techniques are then used to reconstruct a collection of reference regulons in these genomes. At the next stage, these reference regulons are used for automatic annotation of regulatory interactions in the remaining genomes from the same taxonomic group.

In 2009–2010, with the goal of building collections of taxonomy-specific reference regulons, we developed two web resources for large-scale inference and analysis of regulatory interactions in prokaryotes. The first is the RegPrecise database (http://regprecise.lbl.gov), which captures and visualizes manually curated regulons (9). The second is the RegPredict web server (http://regpredict.lbl.gov), which supports fast and accurate regulon reconstruction (8).

The community-based approach to regulon reconstruction implemented in the RegPredict web server enabled fast accumulation of a large number of curated regulatory interactions (10–13). The current release of RegPrecise 2.0 captures detailed descriptions of two groups of regulogs:
~1000 regulogs controlled by protein transcription factors (TFs) in 13 taxonomic groups of bacteria (147 genomes total);
~400 regulogs controlled by conserved ribonucleic acid (RNA) regulatory motifs (e.g. riboswitches) in 22 taxonomic groups (255 genomes total).

The analyzed regulators in RegPrecise have more than 200 effectors in total. These effectors belong to the following major classes of metabolites: amino acids, carbohydrates, nucleotides, lipids and fatty acids, coenzymes, peptides and antibiotics, secondary metabolites and inorganic chemicals. Besides TF regulons, the latest release of RegPrecise includes a large collection of curated regulons controlled by RNA regulatory elements (such as riboswitches). These regulons were annotated with the RegPredict web server using RNA models from the Rfam database (14). We anticipate that the content of the RegPrecise database will continue to grow rapidly, owing to large-scale regulon annotation projects conducted by the scientific community using RegPredict.

RegPrecise serves as a core resource on transcriptional regulation for the microbial domain of the DOE Systems Biology Knowledgebase (15). The Knowledgebase is a community-driven cyberinfrastructure for sharing and integrating data and analytical tools to accelerate predictive biology. In the context of this project, we developed a comprehensive set of RESTful web services for programmatic access to the whole scope of transcriptional regulatory data provided by RegPrecise.



DATA TYPES AND ORGANIZATION 

The hierarchical organization of the RegPrecise database has three major levels: (i) a regulon, (ii) a regulog and (iii) a collection of regulogs. A regulon is the basic unit of RegPrecise and represents a set of genes in a particular genome that are co-regulated by the same TF or RNA regulatory motif.

The major types of output in the RegPrecise web services API are represented by six objects that provide detailed information about reconstructed regulatory interactions (Figure 1).

The Regulon object is the basic unit and provides a general description of a regulon. This description includes the name of the genome, the common name of the regulator, its protein family, the predicted effector molecule and the metabolic pathway (or biological process) controlled by the regulator. The regulationType property of the Regulon object identifies the mechanism of regulation. The value is either TF, for regulation by a transcription factor, or RNA, for regulation by an RNA regulatory element.

Three types of objects are directly linked to Regulon: Regulator, Gene and Site. The Regulator object represents the actual gene encoding the TF and provides its gene name, locus tag and vimssId [gene identifier in the MicrobesOnline (16) database].



In rare cases, comparative genomics does not allow unambiguous selection of the true cognate regulator among several homologous TFs. In such cases, more than one regulator can be assigned to a particular regulon. For regulation by RNA elements, no Regulator objects are assigned to a regulon. The Gene object represents a regulated gene and provides general information, such as gene name, locus tag, gene vimssId and function. The Site object represents either a TF-binding site or an RNA regulatory element. It provides the site's sequence, score, position relative to the translational start of the gene, and the identifiers of the downstream gene.
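The Regulon-level objects described above can be mirrored on the client side as plain record types. The following Python sketch assumes that property names match those listed in Figure 1 and that identifiers are integers. The nested lists and the validate helper are hypothetical client-side conveniences, not part of the API output.

```python
from dataclasses import dataclass, field
from typing import List

# Field names follow the properties listed in Figure 1; Python types are assumptions.


@dataclass
class Regulator:
    regulonId: int
    name: str
    locusTag: str
    vimssId: int  # gene identifier in MicrobesOnline
    regulatorFamily: str


@dataclass
class Gene:
    regulonId: int
    name: str
    locusTag: str
    vimssId: int
    function: str


@dataclass
class Site:
    regulonId: int
    sequence: str
    position: int  # relative to the translational start of the downstream gene
    score: float
    geneLocusTag: str
    geneVIMSSId: int


@dataclass
class Regulon:
    regulonId: int
    regulogId: int
    regulationType: str  # "TF" or "RNA"
    regulatorName: str
    regulatorFamily: str
    genomeId: int
    genomeName: str
    effector: str
    pathway: str
    # Hypothetical client-side containers filled from the /regulators, /genes
    # and /sites resources; they are not properties of the Regulon object.
    regulators: List[Regulator] = field(default_factory=list)
    genes: List[Gene] = field(default_factory=list)
    sites: List[Site] = field(default_factory=list)

    def validate(self) -> None:
        """Check the constraints on regulators stated in the text."""
        if self.regulationType not in ("TF", "RNA"):
            raise ValueError(f"unknown regulationType: {self.regulationType!r}")
        if self.regulationType == "RNA" and self.regulators:
            raise ValueError("RNA-regulated regulons carry no Regulator objects")
        # A TF regulon may list several regulators when the cognate TF cannot
        # be resolved among homologs, so no upper bound is enforced here.
```

Using a list for regulators rather than a single field accommodates the rare regulons with more than one candidate cognate TF.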

The Regulog object combines several regulons, controlled by orthologous regulators, that were reconstructed in a set of closely related genomes. As with the Regulon object, general information about the regulator, effector and metabolic pathway is available for the Regulog object. In addition, the taxonName property gives the name of the NCBI taxon that encompasses all genomes analyzed in a given regulog.

Finally, the RegulogCollection object represents the highest level in the RegPrecise hierarchical data organization. Six types of collections are available in RegPrecise:
by taxonomy;
by orthologous TF;
by TF family;
by RNA regulatory family;
by effector molecule of a regulator;
by regulated metabolic pathway.
The collectionType property of the RegulogCollection object encodes the type of collection and takes one of the following values: 'taxGroup', 'tf', 'tfFam', 'rnaFam', 'effector' and 'pathway'. Some types of collections, such as the collections by effector and by metabolic pathway, have a two-level hierarchical structure. For these, the className property represents the upper level and is provided in addition to the collection name. Note that one Regulog can be assigned to several regulog collections.
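A client that caches regulog collections must account for two points made in the text. First, one regulog can belong to several collections. Second, a collectionId is unique only within its collection type. The sketch below therefore keys collections on the (collectionType, collectionId) pair. The Python types and the index_memberships helper are illustrative assumptions. The link triples add the collection type to the collectionId/regulogId pair shown for Collection2Regulog in Figure 1.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple

# The six collectionType values listed in the text.
COLLECTION_TYPES = frozenset({"taxGroup", "tf", "tfFam", "rnaFam", "effector", "pathway"})

CollectionKey = Tuple[str, int]  # (collectionType, collectionId)


@dataclass(frozen=True)
class Regulog:
    regulogId: int
    regulationType: str
    regulatorName: str
    regulatorFamily: str
    taxonName: str  # NCBI taxon covering all genomes in the regulog
    effector: str
    pathway: str


@dataclass(frozen=True)
class RegulogCollection:
    collectionId: int
    collectionType: str
    name: str
    className: Optional[str] = None  # upper level, only for two-level types

    def __post_init__(self) -> None:
        if self.collectionType not in COLLECTION_TYPES:
            raise ValueError(f"unknown collectionType: {self.collectionType!r}")

    @property
    def key(self) -> CollectionKey:
        # collectionId alone is not unique across collection types.
        return (self.collectionType, self.collectionId)


def index_memberships(links: Iterable[Tuple[str, int, int]]) -> Dict[int, Set[CollectionKey]]:
    """Map each regulogId to every collection it belongs to.

    links holds (collectionType, collectionId, regulogId) triples.
    """
    memberships: Dict[int, Set[CollectionKey]] = defaultdict(set)
    for collection_type, collection_id, regulog_id in links:
        memberships[regulog_id].add((collection_type, collection_id))
    return dict(memberships)
```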



WEB SERVICES API 

The developed web services API enables programmatic access to the whole content of the RegPrecise database. The API is implemented as a set of RESTful web services that provide data in either JavaScript Object Notation (JSON) or Extensible Markup Language (XML), two widely used formats. The base Uniform Resource Locator (URL) for all web services is http://regprecise.lbl.gov/Services/rest/.
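Every resource is addressed relative to the base URL given above. Below is a minimal Python sketch of a client that uses only the standard library. It assumes that JSON output can be requested through an Accept: application/json header; this should be checked against the official documentation before use.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://regprecise.lbl.gov/Services/rest/"


def build_url(resource: str, **params: Any) -> str:
    """Join the base URL, a resource name and optional query parameters."""
    url = BASE_URL + resource.lstrip("/")
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_json(resource: str, timeout: float = 30.0, **params: Any) -> Any:
    """GET a resource and decode the response body as JSON.

    Assumption: JSON is selected with the Accept header; XML would
    presumably be requested with application/xml instead.
    """
    request = Request(build_url(resource, **params),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, fetch_json("genomes") would issue a GET request to http://regprecise.lbl.gov/Services/rest/genomes.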

The current version of the RegPrecise web services API includes 14 resources that can be classified into four categories (Table 1). The core resources provide access to three kinds of data: the regulators (genes encoding TFs), the target regulated genes, and the sites (either TF-binding sites or RNA regulatory elements). Together, these three resources provide complete information about the content of a particular regulon and can be queried by regulon identifier. Alternatively, a user may be interested in all genes regulated by orthologs of a particular TF in a group of closely related genomes. In that case, the genes resource can be queried by the identifier of the corresponding regulog. The same applies to the regulators and sites resources. The regulog and regulon resources can be used to obtain summary


W606 Nucleic Acids Research, 2012, Vol. 40, Web Server issue 



Figure 1. The major output objects and their properties in RegPrecise web services API 1.0.
Collection types: Taxonomy, TFOrtholog, TFFamily, Riboswitch, Effector, Pathway
Collection2Regulog: collectionId, regulogId
RegulogCollection: collectionId, collectionType, name, className
Regulog: regulogId, regulationType, regulatorName, regulatorFamily, taxonName, effector, pathway
DatabaseRelease: majorVersion, minorVersion, releaseDate
Regulon: regulonId, regulogId, regulationType, regulatorName, regulatorFamily, genomeId, genomeName, effector, pathway
Genome: genomeId, name, taxonomyId
Regulator: regulonId, name, locusTag, vimssId, regulatorFamily
Gene: regulonId, name, locusTag, vimssId, function
Site: regulonId, sequence, position, score, geneLocusTag, geneVIMSSId



Table 1. List of resources available in RegPrecise web services API 1.0

Resource                  Description

Core resources
/regulators               Represents a list of analyzed regulators that belong to either a given regulon or regulog.
/genes                    Represents a list of regulated genes that belong to either a given regulon or regulog.
/sites                    Represents a list of TF-binding sites or RNA regulatory elements that belong to either a given regulon or regulog.
/regulog                  Represents a regulog.
/regulon                  Represents a regulon.

Navigation resources
/regulogCollections       Represents various types of regulog classifications: by taxonomic group, orthologous TF, TF family, RNA regulatory element, metabolic pathway or biological process, effector molecule or environmental signal.
/genomes                  Represents a list of genomes that have at least one reconstructed regulon.
/regulogs                 Represents a list of regulogs in a collection of a given type and identifier.
/regulons                 Represents a list of regulons in either a particular regulog or a genome.

Statistics resources
/regulogCollectionStats   Represents general statistics on regulog collections of a particular type.
/genomeStats              Represents general statistics on genomes that have at least one reconstructed regulon.

Utility resources
/searchRegulons           Search regulons by the name or locus tag of either regulator or target regulated genes.
/searchExtRegulons        Search regulons by a genome and list of gene locus tags.
/release                  Represents the version and the release date of the current version of RegPrecise.



information about the intrinsic attributes of a regulog or regulon. These attributes include the associated metabolic pathway (or biological process), effector molecules, the regulation mode (activation or repression), the regulation type (TF or RNA) and the regulator family.
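The distinction between regulon-scoped and regulog-scoped queries of the core resources can be captured in a small URL builder. This sketch assumes the parameter names regulonId and regulogId, as used in the use cases below. It also assumes that records are returned either as a bare list or wrapped under a single key. as_records is a hypothetical helper that normalizes both forms.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode

BASE_URL = "http://regprecise.lbl.gov/Services/rest/"
LIST_RESOURCES = ("regulators", "genes", "sites")


def core_list_url(resource: str, regulon_id: Optional[int] = None,
                  regulog_id: Optional[int] = None) -> str:
    """URL for /regulators, /genes or /sites scoped to one regulon or one regulog."""
    if resource not in LIST_RESOURCES:
        raise ValueError(f"not a core list resource: {resource!r}")
    if (regulon_id is None) == (regulog_id is None):
        raise ValueError("pass exactly one of regulon_id or regulog_id")
    if regulon_id is not None:
        params = {"regulonId": regulon_id}
    else:
        params = {"regulogId": regulog_id}
    return f"{BASE_URL}{resource}?{urlencode(params)}"


def as_records(payload: Any) -> List[Dict[str, Any]]:
    """Normalize decoded JSON into a list of record dicts.

    Assumption: a response is either a bare list, or an object wrapping a
    single record or a list of records under one key.
    """
    if payload is None:
        return []
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        if not payload:
            return []
        if len(payload) == 1:
            (inner,) = payload.values()
            if isinstance(inner, list):
                return inner
            if isinstance(inner, dict):
                return [inner]
        return [payload]
    raise TypeError(f"unexpected payload type: {type(payload).__name__}")
```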



The navigation resources allow traversal of the hierarchical organization of the RegPrecise data (Table 1). The list of available regulog collections of each of the six collection types can be retrieved using the regulogCollections resource. The list of regulogs that belong to a particular collection can then be obtained with the regulogs resource. Note that collection identifiers are unique only within the space of regulog collections of a particular type. Thus, in addition to the identifier, the collection type must be provided as a query parameter. Finally, the list of regulons analyzed for a particular TF or RNA motif in a set of closely related genomes can be accessed with the regulons resource, given a regulog identifier.

An alternative navigation route starts from the list of genomes that have at least one reconstructed regulon, which can be obtained with the genomes resource. The same regulons resource can then be used to get the list of all reconstructed regulons for a particular genome.
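The two navigation routes can be written as short traversals over a generic fetch function, for example one built from an HTTP client that decodes JSON into a list of records. In this sketch, passing regulogId to the regulons resource is an assumption made by analogy with the genes resource. The function names are illustrative.

```python
from typing import Any, Callable, Dict, Iterator, List

# fetch(resource, **params) -> list of record dicts. Any HTTP client that
# decodes JSON into records can be plugged in; tests can pass a stub.
Fetch = Callable[..., List[Dict[str, Any]]]


def list_collections(fetch: Fetch, collection_type: str) -> List[Dict[str, Any]]:
    """All regulog collections of one type, e.g. 'taxGroup' or 'rnaFam'."""
    return fetch("regulogCollections", collectionType=collection_type)


def regulons_in_collection(fetch: Fetch, collection_type: str,
                           collection_id: Any) -> Iterator[Dict[str, Any]]:
    """Route 1: collection -> regulogs -> regulons."""
    # collectionId is unique only within one collectionType, so send both.
    regulogs = fetch("regulogs", collectionType=collection_type,
                     collectionId=collection_id)
    for regulog in regulogs:
        # Assumed parameter name, by analogy with genes?regulogId=...
        yield from fetch("regulons", regulogId=regulog["regulogId"])


def regulons_in_genome(fetch: Fetch, genome_id: Any) -> List[Dict[str, Any]]:
    """Route 2: genome -> regulons (regulons?genomeId=...)."""
    return fetch("regulons", genomeId=genome_id)
```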

To provide an overview of the RegPrecise content, we developed two resources that report general statistics. The regulogCollectionStats resource covers regulog collections, and the genomeStats resource covers genomes. These statistics include the numbers of genomes, reconstructed regulogs, regulons and regulatory sites for TFs and RNA elements.
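The statistics resources return per-genome or per-collection counts that a client may want to total. Because the exact count property names are not listed here, the sketch below takes them as arguments. The collectionType parameter for regulogCollectionStats is likewise an assumption.

```python
from typing import Dict, Iterable, List, Mapping
from urllib.parse import urlencode

BASE_URL = "http://regprecise.lbl.gov/Services/rest/"


def genome_stats_url() -> str:
    return BASE_URL + "genomeStats"


def collection_stats_url(collection_type: str) -> str:
    # Assumption: the type is passed as collectionType, as for /regulogCollections.
    return BASE_URL + "regulogCollectionStats?" + urlencode({"collectionType": collection_type})


def sum_fields(records: Iterable[Mapping[str, object]], fields: List[str]) -> Dict[str, int]:
    """Total the given count fields over all statistics records.

    Values may arrive as strings in JSON, so they are converted with int();
    missing or empty values are skipped.
    """
    totals = {name: 0 for name in fields}
    for record in records:
        for name in fields:
            value = record.get(name)
            if value is not None and value != "":
                totals[name] += int(value)
    return totals
```

Passing the field names explicitly keeps the helper usable for both statistics resources without hard-coding property names that should come from the documentation.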

Finally, the utility resources allow searching for regulons by the names or locus tags of either target regulated genes or TFs (searchRegulons resource). We also developed a special resource, searchExtRegulons, for analyzing gene sets. It takes the NCBI taxonomy id of a genome and a comma-separated list of gene locus tags. It returns the non-redundant list of regulons that contain at least one of the provided genes. In particular, this resource can be used for automatic validation of gene clusters reconstructed from the analysis of expression data. The current version of the RegPrecise database underlying the web services API can be obtained with the release resource.
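For large gene sets, such as clusters derived from expression data, a client may split the locus-tag list into several searchExtRegulons requests and merge the results. The parameter names taxonomyId and locusTags in this sketch are hypothetical placeholders for whatever the resource documentation specifies. Merging on regulonId preserves the non-redundancy that the resource provides for a single request.

```python
from typing import Dict, Iterable, List, Sequence
from urllib.parse import urlencode

BASE_URL = "http://regprecise.lbl.gov/Services/rest/"


def search_ext_url(taxonomy_id: int, locus_tags: Sequence[str]) -> str:
    # Hypothetical parameter names; check the resource documentation.
    params = {"taxonomyId": taxonomy_id, "locusTags": ",".join(locus_tags)}
    # Keep commas literal so the list stays comma-separated in the query.
    return BASE_URL + "searchExtRegulons?" + urlencode(params, safe=",")


def batches(items: Sequence[str], size: int) -> List[List[str]]:
    """Split a long gene list so that each request URL stays short."""
    if size < 1:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def merge_regulons(results: Iterable[List[Dict]]) -> List[Dict]:
    """Combine per-batch results into one non-redundant list, keyed by regulonId."""
    merged: Dict[object, Dict] = {}
    for batch in results:
        for regulon in batch:
            merged.setdefault(regulon["regulonId"], regulon)
    return list(merged.values())
```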

Complete documentation of the listed resources can be found at http://regprecise.lbl.gov/RegPrecise/services.jsp. Two types of client code examples are provided.

First, we provide a template program in Perl that can be run to access several of the web services and parse the output data. The program consists of two Perl files:
(i) RegPreciseAdapter.pm, a Perl module that provides access to the individual web services;
(ii) regulons.pl, an example workflow that combines several web services.
Both files can easily be modified to access all available web services.

Second, each web service is accompanied by an example of accessing the API with the cURL command-line tool.



USE CASES 

Here we describe two possible scenarios for using the RegPrecise web services API.

Scenario 1: obtaining information on the TRN available for a particular genome. To check whether regulatory interactions are available for a given genome, the user can query the genomes resource. By searching the output for the NCBI taxonomy id of the genome of interest, the user can confirm that this genome is present in RegPrecise. For instance, the NCBI taxonomy id of Shewanella baltica OS155 is 325240, which corresponds to the internal genome identifier genomeId=7. The list of all reconstructed regulons in this genome can be retrieved by querying regulons?genomeId=7. Analysis of the regulationType property of each regulon in the output shows that 63 regulons are regulated by TFs, whereas 14 regulons are controlled by RNA regulatory elements. Among the TF-operated regulons, the tryptophan regulon TrpR has regulonId=6378. The list of TrpR-regulated genes can be retrieved by querying genes?regulonId=6378.
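Scenario 1 can be scripted as a chain of three queries. The Python sketch below takes an injectable fetch(resource, **params) function that returns a list of record dicts. Property names follow Figure 1. Taxonomy ids are compared as strings on the assumption that the JSON output may encode numbers as text.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Optional, Tuple

# fetch(resource, **params) returns a list of record dicts (hypothetical interface).
Fetch = Callable[..., List[Dict[str, Any]]]


def find_genome_id(fetch: Fetch, taxonomy_id: Any) -> Optional[Any]:
    """Return the internal genomeId for an NCBI taxonomy id, or None if absent."""
    for genome in fetch("genomes"):
        # Compare as strings in case the JSON encodes numbers as text.
        if str(genome.get("taxonomyId")) == str(taxonomy_id):
            return genome["genomeId"]
    return None


def scenario_1(fetch: Fetch, taxonomy_id: Any,
               regulator_name: str) -> Tuple[Any, Counter, List[Dict[str, Any]]]:
    """genomes -> regulons?genomeId=... -> genes?regulonId=..."""
    genome_id = find_genome_id(fetch, taxonomy_id)
    if genome_id is None:
        raise LookupError(f"taxonomy id {taxonomy_id} not found in RegPrecise")
    regulons = fetch("regulons", genomeId=genome_id)
    # Count regulons by mechanism: "TF" or "RNA".
    by_type = Counter(r.get("regulationType") for r in regulons)
    genes: List[Dict[str, Any]] = []
    for regulon in regulons:
        if regulon.get("regulatorName") == regulator_name:
            genes = fetch("genes", regulonId=regulon["regulonId"])
            break
    return genome_id, by_type, genes
```

With a live client, scenario_1(fetch, 325240, "TrpR") would return the genome identifier, the per-type regulon counts and the TrpR-regulated genes. For the release described here, the text reports genomeId=7, 63 TF-regulated regulons and 14 RNA-regulated regulons, although later releases may differ.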

Scenario 2: obtaining all genes that are regulated by a particular RNA motif in any genome. Suppose, for example, that the user is interested in genes regulated by the FMN riboswitch. Analysis of the output of the regulogCollections?collectionType=rnaFam query identifies the corresponding regulog collection (collectionId=25). The list of all reconstructed regulogs in this collection can be retrieved by querying regulogs?collectionType=rnaFam&collectionId=25.

Finally, by iterating through each regulog, the complete list of genes regulated by the FMN riboswitch can be retrieved from the genes resource. Each query passes a regulog identifier as a parameter, e.g. genes?regulogId=1450.
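Scenario 2 follows the navigation route from a collection to its regulogs and then to the genes of each regulog. This sketch assumes that the FMN riboswitch collection can be found by an exact, case-insensitive match on the collection name. The function names and the grouping of genes by regulog are illustrative choices.

```python
from typing import Any, Callable, Dict, List

# fetch(resource, **params) returns a list of record dicts (hypothetical interface).
Fetch = Callable[..., List[Dict[str, Any]]]


def find_collection_id(fetch: Fetch, collection_type: str, name: str) -> Any:
    """Return the collectionId whose name matches `name` (case-insensitive)."""
    matches = [c for c in fetch("regulogCollections", collectionType=collection_type)
               if str(c.get("name", "")).lower() == name.lower()]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} {collection_type} collections named {name!r}")
    return matches[0]["collectionId"]


def genes_regulated_by_rna_family(fetch: Fetch, family: str) -> Dict[Any, List[Dict[str, Any]]]:
    """regulogCollections -> regulogs -> genes?regulogId=..., grouped by regulog."""
    collection_id = find_collection_id(fetch, "rnaFam", family)
    genes_by_regulog: Dict[Any, List[Dict[str, Any]]] = {}
    # The collection type is sent along because collectionId is unique only per type.
    for regulog in fetch("regulogs", collectionType="rnaFam", collectionId=collection_id):
        regulog_id = regulog["regulogId"]
        genes_by_regulog[regulog_id] = fetch("genes", regulogId=regulog_id)
    return genes_by_regulog
```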



FUTURE DEVELOPMENT 

We are currently working on the automatic conservative 
propagation of all regulons inferred in the reference set of 
genomes to all closely related genomes from the same 
taxonomic group. We will develop new web services to 
enable programmatic access to the results of propagation. 